canada <- read.csv("data/clean_data/24monthly_canada_freight.csv")
canada <- canada[, c("Date", "Value")]
canada_ts <- ts(canada$Value,
                start = decimal_date(as.Date("2006-01-01", format = "%Y-%m-%d")),
                end = decimal_date(as.Date("2023-01-01", format = "%Y-%m-%d")),
                frequency = 12)

employment <- read.csv("data/clean_data/32monthly_employment.csv")
employment <- employment[, c("Date", "Transportation.Employment...Air.Transportation")]
employment_ts <- ts(employment$Transportation.Employment...Air.Transportation,
                    start = decimal_date(as.Date("2005-01-01", format = "%Y-%m-%d")),
                    end = decimal_date(as.Date("2024-01-01", format = "%Y-%m-%d")),
                    frequency = 12)

tsi <- read.csv("data/clean_data/35monthly_TSI.csv")
tsi_ts <- ts(tsi$Transportation.Services.Index...Freight,
             start = decimal_date(as.Date("2000-01-01", format = "%Y-%m-%d")),
             end = decimal_date(as.Date("2023-01-01", format = "%Y-%m-%d")),
             frequency = 12)

air <- read.csv("data/clean_data/33revenue.csv")
air <- air[air$Mode == "Air carrier, domestic", c("Year", "Value")]
air <- air[order(air$Year), ]
air_ts <- ts(air$Value,
             start = decimal_date(as.Date("2000-01-01", format = "%Y-%m-%d")),
             end = decimal_date(as.Date("2021-01-01", format = "%Y-%m-%d")),
             frequency = 1)

ups <- getSymbols("UPS", auto.assign = FALSE, from = "2017-01-01", to = "2024-01-01", src = "yahoo")
ups <- data.frame(ups)
ups <- data.frame(ups, rownames(ups))
colnames(ups)[7] <- "date"
ups$date <- as.Date(ups$date, "%Y-%m-%d")
ups_ts <- ts(ups$UPS.Adjusted,
             start = decimal_date(as.Date("2017-01-03", format = "%Y-%m-%d")),
             frequency = 365.25)
plot1 <- ggAcf(canada_ts) + ggtitle("U.S.-Canada Freight Value ACF") + theme_bw()
plot2 <- ggPacf(canada_ts) + ggtitle("U.S.-Canada Freight Value PACF") + theme_bw()
grid.arrange(plot1, plot2, nrow = 2)
Based on the ACF and PACF plots, we observe strong correlations at the beginning lags, gradually decreasing but remaining relatively strong until lag 12 in the ACF plot. In contrast, the PACF plot also exhibits strong correlation at lag 1, with moderate correlation extending until lag 13. The presence of significant correlations in both plots suggests that the time series data is non-stationary.
plot1 <- ggAcf(employment_ts) + ggtitle("U.S. Air Transportation Employment ACF") + theme_bw()
plot2 <- ggPacf(employment_ts) + ggtitle("U.S. Air Transportation Employment PACF") + theme_bw()
grid.arrange(plot1, plot2, nrow = 2)
Based on the ACF plot, there is strong correlation observed from lag 1 to lag 16, with the correlation gradually decreasing but remaining relatively strong throughout. Similarly, the PACF plot shows strong correlation at lag 1 and lag 2. Given the presence of significant correlations in both plots, we can infer that the series is likely non-stationary.
plot1 <- ggAcf(tsi_ts) + ggtitle("U.S. Freight Transportation Services Index ACF") + theme_bw()
plot2 <- ggPacf(tsi_ts) + ggtitle("U.S. Freight Transportation Services Index PACF") + theme_bw()
grid.arrange(plot1, plot2, nrow = 2)
Based on the ACF plot, there is a strong correlation observed at lag 1, with the correlation slightly decreasing but remaining relatively strong until the end of the plot. Similarly, the PACF plot shows strong correlation at lag 1. Given the presence of significant correlations in both plots, we can infer that the series is likely non-stationary.
plot1 <- ggAcf(air_ts) + ggtitle("U.S. Domestic Air Carrier Average Freight Revenue ACF") + theme_bw()
plot2 <- ggPacf(air_ts) + ggtitle("U.S. Domestic Air Carrier Average Freight Revenue PACF") + theme_bw()
grid.arrange(plot1, plot2, nrow = 2)
The ACF plot demonstrates strong correlation at lag 1 and moderate correlation at lags 2 and 3, while the PACF plot exhibits strong correlation only at lag 1. Given the significant correlations in both plots, particularly at lag 1, and the moderate correlation at subsequent lags, the series is likely non-stationary.
The ACF plot shows strong correlation beginning at lag 1, decreasing only slightly and remaining strong through the final lag, while the PACF plot exhibits strong correlation only at lag 1. Given the sustained autocorrelation observed in the ACF plot, the series is likely non-stationary.
Augmented Dickey-Fuller Test
data: canada_ts
Dickey-Fuller = -3.0511, Lag order = 5, p-value = 0.1356
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis of a unit root at the 5% significance level, so the series appears to be non-stationary. This is consistent with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots.
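Because failing to reject the ADF null is only weak evidence on its own, a complementary test such as KPSS, whose null hypothesis is stationarity, can serve as a cross-check. The sketch below assumes the tseries package is installed and canada_ts is defined as above:

```r
# KPSS test: the null hypothesis is (level) stationarity,
# so a small p-value here *supports* non-stationarity
tseries::kpss.test(canada_ts, null = "Level")

# If the ADF test fails to reject and the KPSS test rejects,
# both tests agree the series is non-stationary and differencing is warranted.
```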
tseries::adf.test(employment_ts)
Augmented Dickey-Fuller Test
data: employment_ts
Dickey-Fuller = -3.1311, Lag order = 6, p-value = 0.1008
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis of a unit root at the 5% significance level, so the series appears to be non-stationary. This is consistent with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots.
tseries::adf.test(tsi_ts)
Augmented Dickey-Fuller Test
data: tsi_ts
Dickey-Fuller = -2.4166, Lag order = 6, p-value = 0.4005
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis of a unit root at the 5% significance level, so the series appears to be non-stationary. This is consistent with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots.
tseries::adf.test(air_ts)
Augmented Dickey-Fuller Test
data: air_ts
Dickey-Fuller = -0.85187, Lag order = 2, p-value = 0.9424
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis of a unit root at the 5% significance level, so the series appears to be non-stationary. This is consistent with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots.
tseries::adf.test(ups_ts)
Augmented Dickey-Fuller Test
data: ups_ts
Dickey-Fuller = -2.2106, Lag order = 12, p-value = 0.4892
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis of a unit root at the 5% significance level, so the series appears to be non-stationary. This is consistent with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots.
The differenced data exhibits greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which shows a sharp drop-off, indicating a substantial reduction in autocorrelation beyond the early lags. However, despite the improvement, the first-order difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.
After applying a second-order difference, the ACF plot shows even stronger autocorrelation, suggesting the data have been over-differenced. Consequently, I have opted to retain the first-order differenced data, which achieves a satisfactory level of stationarity.
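Rather than judging the differencing order purely by eye, the forecast package provides unit-root-based heuristics for choosing it. A sketch, assuming forecast is loaded and canada_ts is defined as above:

```r
library(forecast)

# Estimated number of ordinary differences needed for stationarity
ndiffs(canada_ts)

# Estimated number of seasonal differences needed (frequency = 12 here);
# a value of 1 would corroborate the residual seasonality seen above
nsdiffs(canada_ts)
```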
The differenced data exhibits greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a substantial reduction in autocorrelation beyond those lags. However, despite the improvement, the first order difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.
The plot above clearly demonstrates that Second Order Differencing effectively renders the data stationary, which is a crucial prerequisite for accurate modeling.
From both the original plots and the ACF plots, it’s evident that the differenced data exhibits greater stationarity compared to the original data. The ACF plot of the differenced data shows a significant drop-off, indicating a lack of autocorrelation beyond those lags, which is characteristic of stationary data.
The differenced data exhibit greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which show a significant drop-off, indicating a lack of autocorrelation beyond those lags. This drop-off is characteristic of stationary data.
The differenced data exhibits greater stationarity compared to the original. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a substantial reduction in autocorrelation beyond those lags. However, despite the improvement, the First Order Difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.
After performing a second-order difference, strong negative autocorrelation appears at lag 1 in the ACF plot, suggesting the data have been over-differenced. Therefore, I have decided to retain the first-order differenced data, which achieves a satisfactory level of stationarity.
Augmented Dickey-Fuller Test
data: diff(canada_ts)
Dickey-Fuller = -6.549, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that the differenced series is stationary. This aligns with the earlier analysis, where the ACF of the differenced series dropped off quickly, indicating stationarity.
tseries::adf.test(diff(diff(employment_ts)))
Augmented Dickey-Fuller Test
data: diff(diff(employment_ts))
Dickey-Fuller = -9.6886, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that the differenced series is stationary. This aligns with the earlier analysis, where the ACF of the differenced series dropped off quickly, indicating stationarity.
tseries::adf.test(diff(tsi_ts))
Augmented Dickey-Fuller Test
data: diff(tsi_ts)
Dickey-Fuller = -5.025, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that the differenced series is stationary. This aligns with the earlier analysis, where the ACF of the differenced series dropped off quickly, indicating stationarity.
tseries::adf.test(diff(air_ts))
Augmented Dickey-Fuller Test
data: diff(air_ts)
Dickey-Fuller = -2.1674, Lag order = 2, p-value = 0.5086
alternative hypothesis: stationary
With a p-value higher than 0.05, we fail to reject the null hypothesis at the 5% significance level, so the differenced series still appears non-stationary. This conflicts with the earlier analysis, where little autocorrelation remained in the ACF and PACF plots, suggesting the data were nearly stationary; with this short annual series, the ADF test may simply lack the power to reject.
tseries::adf.test(diff(ups_ts))
Augmented Dickey-Fuller Test
data: diff(ups_ts)
Dickey-Fuller = -11.498, Lag order = 12, p-value = 0.01
alternative hypothesis: stationary
With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that the differenced series is stationary. This aligns with the earlier analysis, where the ACF of the differenced series dropped off quickly, indicating stationarity.
ARIMA(p,d,q)
In this section, we will identify potential values for the parameters p and q based on the significant lags observed in the PACF and ACF plots of the original data, respectively. Specifically, we will consider the most significant lags from the PACF plot for the value of p, and from the ACF plot for the value of q.
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order p
for (p in 0:2) {
  ## loop over MA order q
  for (q in 0:3) {
    ## loop over differencing order d
    for (d in 0:2) {
      ARMA_res[[cc]] <- Arima(y = canada_ts, order = c(p, d, q), include.drift = TRUE)
      cc <- cc + 1
    }
  }
}
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
ARMA_res[[which.min(ARMA_AIC)]]
The model with the lowest AIC and BIC is ARIMA(2,2,3).
p = 0,1,2, d = 0,1,2, q = 0,1,2,3
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order p
for (p in 0:2) {
  ## loop over MA order q
  for (q in 0:3) {
    ## loop over differencing order d
    for (d in 0:2) {
      ARMA_res[[cc]] <- Arima(y = employment_ts, order = c(p, d, q), include.drift = TRUE)
      cc <- cc + 1
    }
  }
}
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
ARMA_res[[which.min(ARMA_AIC)]]
The model with the lowest AIC and BIC is ARIMA(2,2,3).
p = 0,1, d = 0,1, q = 0,1,2,3
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order p
for (p in 0:1) {
  ## loop over MA order q
  for (q in 0:3) {
    ## loop over differencing order d
    for (d in 0:1) {
      ARMA_res[[cc]] <- Arima(y = tsi_ts, order = c(p, d, q), include.drift = TRUE)
      cc <- cc + 1
    }
  }
}
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
ARMA_res[[which.min(ARMA_AIC)]]
The model with the lowest AIC and BIC is ARIMA(0,1,3).
p = 0,1, d = 0,1, q = 0,1,2
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order p
for (p in 0:1) {
  ## loop over MA order q
  for (q in 0:2) {
    ## loop over differencing order d
    for (d in 0:1) {
      ARMA_res[[cc]] <- Arima(y = air_ts, order = c(p, d, q), include.drift = TRUE)
      cc <- cc + 1
    }
  }
}
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
ARMA_res[[which.min(ARMA_AIC)]]
The model with the lowest AIC and BIC is ARIMA(0,1,1).
p = 0,1, d = 0,1,2, q = 0,1,2
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order p
for (p in 0:1) {
  ## loop over MA order q
  for (q in 0:2) {
    ## loop over differencing order d
    for (d in 0:2) {
      ARMA_res[[cc]] <- Arima(y = ups_ts, order = c(p, d, q), include.drift = TRUE)
      cc <- cc + 1
    }
  }
}
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
ARMA_res[[which.min(ARMA_AIC)]]
Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation. The normal Q-Q plot of the residuals suggests that the assumption of normality is reasonable, with the exception of potential outliers. The results of the Ljung-Box test reveal values below the 0.05 (5% significance) threshold, indicating the presence of some significant correlation remaining. Despite these findings, it appears that the model has been improved.
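Diagnostics like these can be produced compactly with the forecast package. A sketch, assuming forecast is loaded; the ARIMA(2,2,3) order is an illustrative placeholder for whichever candidate model is under review:

```r
library(forecast)

# Hypothetical candidate model on the U.S.-Canada freight series
fit <- Arima(canada_ts, order = c(2, 2, 3))

# One call produces the residual time plot, residual ACF, a histogram,
# and a Ljung-Box test; a large Ljung-Box p-value suggests no
# significant autocorrelation remains in the residuals
checkresiduals(fit)
```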
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.
Coefficients:
Estimate SE t.value p.value
ma1 1.000 0.2724 3.6708 0.0016
constant -0.151 5.1467 -0.0293 0.9769
sigma^2 estimated as 145.6943 on 19 degrees of freedom
AIC = 8.252253 AICc = 8.283999 BIC = 8.40147
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.
Series: ups_ts
ARIMA(0,1,0) with drift
Coefficients:
drift
0.0366
s.e. 0.0557
sigma^2 = 5.465: log likelihood = -3989.16
AIC=7982.32 AICc=7982.33 BIC=7993.26
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 5.173998e-05 2.336453 1.51377 -0.01543305 1.161838 0.04821096
ACF1
Training set 0.03201884
Coefficients:
Estimate SE t.value p.value
constant 0.0366 0.0557 0.6569 0.5113
sigma^2 estimated as 5.462112 on 1758 degrees of freedom
AIC = 4.537987 AICc = 4.537988 BIC = 4.544209
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.
The best model provided by Auto ARIMA differs from the best model identified in the previous step. This discrepancy suggests that the Auto ARIMA model may offer a better fit compared to the previously chosen model, which was not optimal. It is plausible that this dataset exhibits strong seasonality, which ARIMA alone may struggle to capture adequately. Therefore, considering SARIMA models might be necessary to effectively model the seasonality present in the data.
The discrepancy between the best model provided by Auto ARIMA and the best model identified in the previous step raises concerns about its reliability. Auto ARIMA solely relies on minimizing AIC/BIC values without considering lag instances of high correlation as reflected in the ACF/PACF plots. This approach may result in overfitting and may not capture the underlying patterns in the data accurately.
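One way to mitigate this concern is to make auto.arima search the model space exhaustively rather than using its default stepwise shortcut. A sketch, assuming the forecast package is loaded and using canada_ts as the example series:

```r
library(forecast)

# Full grid search (slower) instead of the default stepwise search,
# with no likelihood approximation during model selection
fit_full <- auto.arima(canada_ts, stepwise = FALSE, approximation = FALSE)

# Compare the selected orders and AICc against the manually chosen model
summary(fit_full)
```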
The best model provided by Auto ARIMA differs from the best model identified in the previous step. This discrepancy suggests that the Auto ARIMA model may offer a better fit compared to the previously chosen model, which was not optimal. It is plausible that this dataset exhibits strong seasonality, which ARIMA alone may struggle to capture adequately. Therefore, considering SARIMA models might be necessary to effectively model the seasonality present in the data.
The discrepancy between the best model provided by Auto ARIMA and the best model identified in the previous step raises concerns about its reliability. Auto ARIMA solely relies on minimizing AIC/BIC values without considering lag instances of high correlation as reflected in the ACF/PACF plots. This approach may result in overfitting and may not capture the underlying patterns in the data accurately.
The blue line represents the forecast, while the two shades of purple depict the prediction intervals, with the darker shade representing the 80% interval and the lighter shade the 95% interval. As the interval widens, indicating greater uncertainty in the forecast, the shaded region becomes wider.
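The interval levels shown in these plots are configurable. A sketch, assuming forecast and ggplot2 are loaded; `fit` here stands in for any fitted model object:

```r
library(forecast)

fit <- auto.arima(canada_ts)                      # any fitted model works here
fc  <- forecast(fit, h = 12, level = c(80, 95))   # explicit prediction-interval levels

# The darker band corresponds to the 80% level, the lighter to 95%
autoplot(fc) + ggplot2::theme_bw()
```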
Examining the forecast plot, there is an initial surge in the monthly U.S.-Canada freight value, followed by a subsequent decline and another increase. Comparing the observed increasing seasonal fluctuations post-2020 to the forecasted fluctuations, it appears that the model accurately predicted the anticipated pattern.
The forecast plot illustrates a subtle decreasing trend in air transportation employment over the next 12 months. The colored bands depict the 95% prediction interval. However, the forecast appears overly smooth and lacks fluctuations. This suggests that the model may not adequately capture the inherent variability in the data.
The forecast plot indicates an ascending trend in the U.S. freight TSI over the ensuing 12 months. The colored bands delineate the 95% prediction interval. Nonetheless, the forecast appears excessively smooth, lacking the expected fluctuations. This suggests that the model might not adequately capture the inherent variability in the data.
pred_air <- forecast(ARMA_res_air, 5)
autoplot(pred_air, xlab = "Date", ylab = "Freight revenue per ton-mile (current cents)", colour = "blue") +
  ggtitle("U.S. Domestic Air Carrier Average Freight Revenue Forecast") +
  theme_bw()
The forecast plot depicts a stagnant trend in the U.S. domestic air carrier average freight revenue over the upcoming five years. The colored bands represent the 95% prediction interval. However, the forecast appears excessively smooth, lacking any noticeable fluctuations. This suggests that the model might not adequately capture the inherent variability in the data.
The forecast plot illustrates a rising trend in the UPS stock price over the forthcoming 50 days. The colored bands represent the 95% prediction interval. However, the forecast appears overly smooth, lacking noticeable fluctuations. This suggests that the model may not adequately capture the inherent variability in the stock price data.
We observe that the ARIMA model best captures the trend, closely aligning with the observed data. The Drift method exhibits a comparable trajectory for the forecast. In contrast, the Mean and Naive benchmark models notably underperform, failing to capture the underlying trend.
# Compute accuracy
output1 <- accuracy(pred_canada)
output2 <- accuracy(meanf(canada_ts, h = 12))
output3 <- accuracy(naive(canada_ts, h = 12))
output4 <- accuracy(rwf(canada_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4)
# Put into a data frame
output_df <- do.call(rbind, output_list)
# Set row names for the data frame
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
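Note that calling accuracy() on a fitted model or its forecast without test data reports training-set errors only, which tends to favor the most flexible model. A fairer comparison holds out the final observations; a sketch, assuming the forecast package is loaded and using canada_ts as the example (the split point is an illustrative assumption):

```r
library(forecast)

# Hold out the final 12 months as a test set
train <- window(canada_ts, end = c(2021, 12))
test  <- window(canada_ts, start = c(2022, 1))

fit <- auto.arima(train)
fc  <- forecast(fit, h = length(test))

# The "Test set" row now reports genuine out-of-sample errors
accuracy(fc, test)
```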
We observe that the ARIMA, Drift, and Naive models exhibit comparable forecast trajectories and best capture the trend, closely aligning with the observed data. In contrast, the Mean benchmark model notably underperforms, failing to capture the underlying trend.
# Compute accuracy
output1 <- accuracy(pred_employment)
output2 <- accuracy(meanf(employment_ts, h = 12))
output3 <- accuracy(naive(employment_ts, h = 12))
output4 <- accuracy(rwf(employment_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4)
# Put into a data frame
output_df <- do.call(rbind, output_list)
# Set row names for the data frame
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
We observe that the ARIMA model best captures the trend, closely aligning with the observed data. The Drift method exhibits a comparable trajectory for the forecast. In contrast, the Mean and Naive benchmark models notably underperform, failing to capture the underlying trend.
# Compute accuracy
output1 <- accuracy(pred_tsi)
output2 <- accuracy(meanf(tsi_ts, h = 12))
output3 <- accuracy(naive(tsi_ts, h = 12))
output4 <- accuracy(rwf(tsi_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4)
# Put into a data frame
output_df <- do.call(rbind, output_list)
# Set row names for the data frame
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
autoplot(air_ts, xlab = "Date", ylab = "Freight revenue per ton-mile (current cents)") +
  autolayer(meanf(air_ts, h = 5), series = "Mean", PI = FALSE) +
  autolayer(naive(air_ts, h = 5), series = "Naïve", PI = FALSE) +
  autolayer(rwf(air_ts, h = 5, drift = TRUE), series = "Drift", PI = FALSE) +
  autolayer(pred_air, series = "ARIMA", PI = FALSE) +
  ggtitle("U.S. Domestic Air Carrier Average Freight Revenue ARIMA vs. Benchmarks") +
  guides(colour = guide_legend(title = "Forecast")) +
  theme_bw()
We observe that the ARIMA model seems to underperform here, failing to accurately capture the underlying trend.
# Compute accuracy
output1 <- accuracy(pred_air)
output2 <- accuracy(meanf(air_ts, h = 12))
output3 <- accuracy(naive(air_ts, h = 12))
output4 <- accuracy(rwf(air_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4)
# Put into a data frame
output_df <- do.call(rbind, output_list)
# Set row names for the data frame
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
We observe that the ARIMA, Drift, and Naive models exhibit comparable forecast trajectories and best capture the trend, closely aligning with the observed data. In contrast, the Mean benchmark model notably underperforms, failing to capture the underlying trend.
# Compute accuracy
output1 <- accuracy(pred_ups)
output2 <- accuracy(meanf(ups_ts, h = 12))
output3 <- accuracy(naive(ups_ts, h = 12))
output4 <- accuracy(rwf(ups_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4)
# Put into a data frame
output_df <- do.call(rbind, output_list)
# Set row names for the data frame
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
Significant changes are apparent in the seasonal differenced dataset, evident from the time series plot. Although the seasonal cycles persist in the plots for seasonal lags 1 and 2, there is a discernible decrease in correlation observed in both the ACF and PACF plots. This reduction in correlation signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.
Significant changes are apparent in the seasonally differenced dataset, evident from the time series plot. Although the seasonal cycles persist in the plot at seasonal lag 1, there is a discernible decrease in correlation observed in both the ACF and PACF plots. This reduction in correlation signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.
P: 1,2 D: 1 Q: 1
tsi_ts %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()
Significant changes are apparent in the seasonal differenced dataset, evident from the time series plot. Although the seasonal cycles persist in the plots for seasonal lags 1 and 2, there is a discernible decrease in correlation observed in both the ACF and PACF plots. This reduction in correlation signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.
Significant changes are apparent in the seasonal differenced dataset, evident from the time series plot. Although the seasonal cycles persist in the plots for seasonal lags 1 and 2, there is a discernible decrease in correlation observed in both the ACF and PACF plots. This reduction in correlation signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.
P: 1 D: 1 Q: 1
Finding Model Parameters For Selected Models
In this section, we will identify potential values for the parameters p, q, P, and Q based on the significant lags observed in the PACF and ACF plots of the seasonally differenced data. Specifically, we will take the most significant lags from the PACF plot for p and P, and from the ACF plot for q and Q.
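The candidate seasonal orders can be compared the same way as the non-seasonal grid, by fitting each model and ranking on an information criterion. A sketch, assuming the forecast package is loaded and using canada_ts as an example; the non-seasonal order (1,1,1) and the (P,D,Q) candidates are illustrative placeholders, not the models selected in this report:

```r
library(forecast)

# Candidate seasonal (P, D, Q) orders suggested by the seasonal ACF/PACF lags
candidates <- list(c(1, 1, 1), c(2, 1, 1))

# Fit each SARIMA candidate with a fixed, illustrative non-seasonal order
fits <- lapply(candidates, function(s) {
  Arima(canada_ts, order = c(1, 1, 1), seasonal = s)
})

# Rank candidates by AICc and keep the smallest
sapply(fits, function(f) f$aicc)
```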
Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation. The normal Q-Q plot of the residuals suggests that the assumption of normality is reasonable, with the exception of potential outliers. The Ljung-Box test yields p-values below the 0.05 (5% significance) threshold, indicating that some significant correlation remains. All coefficient p-values are significant. Despite these findings, it appears that the model has been improved.
Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation. The normal Q-Q plot of the residuals suggests that the assumption of normality is reasonable, with the exception of potential outliers. The Ljung-Box test yields p-values below the 0.05 (5% significance) threshold, indicating that some significant correlation remains. All coefficient p-values are significant. Despite these findings, it appears that the model has been improved.
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.
Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals just one significant departure from the model assumptions. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. The Ljung-Box test yields p-values below the 0.05 (5% significance) threshold, indicating that some significant correlation remains. Despite these findings, it appears that the model has been improved.
The discrepancy between the best model provided by Auto ARIMA and the best model identified in the previous step raises concerns about the reliability of the Auto ARIMA result. Auto ARIMA relies solely on minimizing AIC/BIC values without considering the lags with high correlation reflected in the ACF/PACF plots. This approach may result in overfitting and may not accurately capture the underlying patterns in the data.
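The manual order search described above can be sketched as a simple grid search over candidate (p, q) orders, comparing AIC values and then checking the residual ACF/PACF before accepting the winner. The sketch below uses base-R arima() and the built-in AirPassengers series purely for illustration; the orders and series are assumptions, not the report's models.

```r
# Illustrative grid search over ARIMA(p,1,q)(0,1,0)[12] candidates,
# keeping the AIC for each fit that converges.
candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$aic <- NA_real_

for (i in seq_len(nrow(candidates))) {
  fit <- try(arima(log(AirPassengers),
                   order    = c(candidates$p[i], 1, candidates$q[i]),
                   seasonal = list(order = c(0, 1, 0), period = 12)),
             silent = TRUE)
  if (!inherits(fit, "try-error")) candidates$aic[i] <- AIC(fit)
}

# The AIC-minimizing order is only a starting point: the residual
# ACF/PACF should still be inspected before accepting it.
best <- candidates[which.min(candidates$aic), ]
print(best)
```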
fit_canada <- Arima(canada_ts, order = c(4, 0, 2), seasonal = c(0, 1, 0))

# forecast the next 12 months
fit_canada %>%
  forecast(h = 12) %>%
  autoplot(xlab = "Date", ylab = "Billions", colour = "blue") +
  ggtitle("U.S.-Canada Freight Value Forecast") +
  theme_bw()
The blue line represents the forecast, while the two shades of purple depict the prediction intervals: the darker, inner band is the narrower 80% interval and the lighter, outer band is the wider 95% interval (the forecast() defaults). The shaded region widens as the horizon grows, indicating greater uncertainty in the forecast.
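The relationship between the two bands follows directly from the normal quantiles used to build prediction intervals. A minimal base-R sketch, with an illustrative point forecast and standard error (not values from the freight model):

```r
# Under normal forecast errors, an interval is the point forecast
# plus/minus a normal quantile times the forecast standard error.
point_forecast <- 100   # illustrative value
forecast_se    <- 5     # illustrative value

z80 <- qnorm(1 - (1 - 0.80) / 2)   # about 1.28
z95 <- qnorm(1 - (1 - 0.95) / 2)   # about 1.96

interval_80 <- point_forecast + c(-1, 1) * z80 * forecast_se
interval_95 <- point_forecast + c(-1, 1) * z95 * forecast_se
# The 95% band is necessarily wider than the 80% band, which is why
# the lighter (outer) shading extends beyond the darker (inner) shading.
```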
Examining the forecast plot, there is an initial surge in the monthly U.S.-Canada freight value, followed by a decline and another increase. The forecast reproduces the larger seasonal fluctuations observed after 2020, suggesting that the model captured the anticipated pattern.
Code
fit_employment <- Arima(employment_ts, order = c(2, 1, 3), seasonal = c(0, 1, 0))

# forecast the next 12 months
fit_employment %>%
  forecast(h = 12) %>%
  autoplot(xlab = "Date", ylab = "Employment", colour = "blue") +
  ggtitle("U.S. Air Transportation Employment Forecast") +
  theme_bw()
The forecast plot illustrates a subtle decreasing trend in air transportation employment over the next 12 months. The colored bands depict the 95% confidence interval. However, the forecast appears overly smooth and lacks fluctuations. This suggests that the model may not adequately capture the inherent variability or fluctuations in the data.
Code
fit_tsi <- Arima(tsi_ts, order = c(1, 1, 3), seasonal = c(0, 0, 0))

# forecast the next 12 months
fit_tsi %>%
  forecast(h = 12) %>%
  autoplot(xlab = "Date", ylab = "Transportation Services Index", colour = "blue") +
  ggtitle("U.S. Freight Transportation Services Index Forecast") +
  theme_bw()
The forecast plot indicates an upward trend in the U.S. freight TSI over the next 12 months. The colored bands delineate the 95% confidence interval. Nonetheless, the forecast appears excessively smooth, lacking the expected fluctuations. This suggests that the model might not adequately capture the inherent variability in the data.
The forecast plot depicts a rising trend in the UPS stock price over the next 50 days. The colored bands represent the 95% confidence interval. Notably, it closely follows the fluctuation pattern observed in the historical data. This alignment suggests that the model may effectively capture the inherent variability and fluctuations present in the stock price data.
We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive methods exhibit a comparable trajectory for the forecast. In contrast, the Mean and Naive benchmark models notably underperform, failing to accurately capture the underlying trend.
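The four benchmark methods compared here have simple closed forms, which is worth keeping in mind when reading the accuracy tables below. A base-R sketch of what each one forecasts, using the built-in AirPassengers series purely for illustration:

```r
# What the benchmark methods forecast, written out in base R.
x <- as.numeric(AirPassengers)   # illustrative monthly series, period 12
h <- 12                          # forecast horizon
n <- length(x)

mean_fc   <- rep(mean(x), h)                         # Mean: average of all history
naive_fc  <- rep(x[n], h)                            # Naive: last observation, flat
snaive_fc <- x[(n - 12 + 1):n]                       # SNaive: repeat last seasonal cycle
drift_fc  <- x[n] + (1:h) * (x[n] - x[1]) / (n - 1)  # Drift: line through first and last points
```

This is why the Mean and Naive methods cannot track a trending, seasonal series, while SNaive at least reproduces the seasonal shape.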
Code
# Compute accuracy for the SARIMA fit and four benchmark methods
output1 <- accuracy(forecast(fit_canada, 12))
output2 <- accuracy(meanf(canada_ts, h = 12))
output3 <- accuracy(naive(canada_ts, h = 12))
output4 <- accuracy(snaive(canada_ts, h = 12))
output5 <- accuracy(rwf(canada_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the SARIMA model outperforms the benchmark models in terms of accuracy.
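For reference, the three headline metrics reported by accuracy() can be computed by hand. A base-R sketch with illustrative actual/forecast vectors (not values from the freight data):

```r
# RMSE, MAE, and MAPE from scratch, on illustrative numbers.
actual   <- c(100, 102, 98, 105)
forecast <- c(101, 100, 99, 104)
err      <- actual - forecast

rmse <- sqrt(mean(err^2))               # penalizes large misses more heavily
mae  <- mean(abs(err))                  # average absolute miss
mape <- mean(abs(err / actual)) * 100   # percentage scale, unit-free
```

RMSE is always at least as large as MAE, and MAPE is the natural choice when comparing series measured in different units.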
We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive method exhibits a comparable trajectory for the forecast. In contrast, the Mean benchmark model notably underperforms, failing to accurately capture the underlying trend.
Code
# Compute accuracy for the SARIMA fit and four benchmark methods
output1 <- accuracy(forecast(fit_employment, 12))
output2 <- accuracy(meanf(employment_ts, h = 12))
output3 <- accuracy(naive(employment_ts, h = 12))
output4 <- accuracy(snaive(employment_ts, h = 12))
output5 <- accuracy(rwf(employment_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")
output_df
Upon examining the accuracy metrics, it's evident that the Naive and Drift models achieved the lowest RMSE, MAE, and MAPE values, followed by the SARIMA model. This suggests that the SARIMA model did not outperform the benchmark models in terms of accuracy.
We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive methods exhibit a comparable trajectory for the forecast. In contrast, the Mean and Naive benchmark models notably underperform, failing to accurately capture the underlying trend.
Code
# Compute accuracy for the SARIMA fit and four benchmark methods
output1 <- accuracy(forecast(fit_tsi, 12))
output2 <- accuracy(meanf(tsi_ts, h = 12))
output3 <- accuracy(naive(tsi_ts, h = 12))
output4 <- accuracy(snaive(tsi_ts, h = 12))
output5 <- accuracy(rwf(tsi_ts, drift = TRUE, h = 12))
output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")
output_df
Upon examining the accuracy metrics, it’s evident that the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the SARIMA model outperforms the benchmark models in terms of accuracy.
We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive methods exhibit a comparable trajectory for the forecast. In contrast, the Mean and Naive benchmark models notably underperform, failing to accurately capture the underlying trend.
Code
# Compute accuracy for the fitted model and four benchmark methods
output1 <- accuracy(pred_ups)
output2 <- accuracy(meanf(ups_ts, h = 50))
output3 <- accuracy(naive(ups_ts, h = 50))
output4 <- accuracy(snaive(ups_ts, h = 50))
output5 <- accuracy(rwf(ups_ts, drift = TRUE, h = 50))
output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")
output_df
Upon examining the accuracy metrics, it's evident that the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the SARIMA model outperforms the benchmark models in terms of accuracy.
# Seasonal cross-validation using 1-step-ahead forecasts
farima1 <- function(x, h) {
  forecast(Arima(x, order = c(4, 0, 2), seasonal = c(0, 1, 0)), h = h)
}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(canada_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))   # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 0.1707379
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 3.525084
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(canada_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm = TRUE))   # s-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 0.6267446
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 7.491664
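One caveat on the reported values: the cross-validation code computes abs(mean(e)), which is the absolute value of the mean error (a bias measure), not the conventional MAE, mean(abs(e)). The two can differ sharply when positive and negative errors cancel, which also explains why the reported "MAE" values sit far below the corresponding RMSE values. A tiny base-R demonstration:

```r
# abs(mean(e)) vs mean(abs(e)) on an illustrative error vector.
e <- c(3, -3, 2, -2, 1)

abs(mean(e))    # 0.2 -- errors nearly cancel, so the bias looks tiny
mean(abs(e))    # 2.2 -- the true mean absolute error
```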
Code
# Seasonal cross-validation using 1-step-ahead forecasts
farima1 <- function(x, h) {
  forecast(Arima(x, order = c(2, 1, 3), seasonal = c(0, 1, 0)), h = h)
}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(employment_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))   # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 262.7295
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 9779.526
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(employment_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm = TRUE))   # s-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 940.5674
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 33433.24
Code
# Cross-validation using 1-step-ahead forecasts
farima1 <- function(x, h) {
  forecast(Arima(x, order = c(1, 1, 3), seasonal = c(0, 0, 0)), h = h)
}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(tsi_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))   # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 0.1098334
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 1.548448
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(tsi_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm = TRUE))       # note: absolute mean error, not mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm = TRUE))   # s-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 0.8662205
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 3.856692
Code
# Seasonal cross-validation using 1-step-ahead forecasts (not run: see below)
# farima1 <- function(x, h) {
#   forecast(Arima(x, order = c(0, 1, 0), seasonal = c(0, 1, 0)), h = h)
# }
#
# Compute cross-validated errors for 1-step-ahead forecasts
# e1 <- tsCV(ups_ts, forecastfunction = farima1, h = 1)
# mae1 <- abs(mean(e1, na.rm = TRUE))
# rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))  # one-step time series cross-validation
# cat("1-Step Forecast MAE:", mae1, "\n")
# cat("1-Step Forecast RMSE:", rmse1, "\n")
#
# Compute cross-validated errors for up to 365 steps ahead
# e <- tsCV(ups_ts, forecastfunction = farima1, h = 365)
# mae <- abs(mean(e, na.rm = TRUE))
# rmse <- sqrt(mean(e^2, na.rm = TRUE))  # s-step time series cross-validation
# cat("S-Step Forecast MAE:", mae, "\n")
# cat("S-Step Forecast RMSE:", rmse, "\n")
The computation exceeds the runtime limit for the daily UPS series, as tsCV refits the model at every forecast origin, so this cross-validation output could not be generated.
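The mechanics of what tsCV() does can still be illustrated cheaply: refit (or re-evaluate) a model at each forecast origin and score the one-step-ahead error. The sketch below substitutes a trivially cheap stand-in model, the seasonal naive forecast, and the built-in AirPassengers series, so it runs in a fraction of a second; it is an illustration of the procedure, not the UPS model.

```r
# Rolling-origin one-step cross-validation, written as a plain loop.
x      <- as.numeric(AirPassengers)   # illustrative monthly series
period <- 12
errors <- rep(NA_real_, length(x))

for (t in period:(length(x) - 1)) {
  fc <- x[t + 1 - period]             # seasonal naive forecast of x[t+1]
  errors[t + 1] <- x[t + 1] - fc      # one-step-ahead error at this origin
}

rmse <- sqrt(mean(errors^2, na.rm = TRUE))
print(rmse)
```

With an ARIMA stand-in, the fc line becomes a model refit on x[1:t] at every origin, which is exactly why the full cross-validation on the long daily series was so expensive.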